Material and methods

RNA expraction and sequencing

Total RNA was isolated using the RNeasy Plant Mini Kit (QIAGEN, Germany) following the manufacturer’s protocol. To eliminate potential DNA contamination, the RNA was further treated with the TURBO DNA-free Kit (Thermo Fisher Scientific, USA) in a 50 µL reaction volume. Purification was performed using Agencourt RNA Clean XP beads (Beckman Coulter, USA). RNA concentration and integrity were evaluated using the Quantitative RiboGreen RNA Assay (Thermo Fisher Scientific) and the RNA 6000 Pico Kit (Agilent Technologies, USA). RNA libraries were constructed using the NEBNext Poly(A) mRNA Magnetic Isolation Module followed by the KAPA RNA Hyper Kit (Roche, Switzerland), adhering to the manufacturer’s instructions. Final library purification was carried out with KAPA HyperPure Beads (Roche, Switzerland). Library fragment size distribution and purity were assessed using the High Sensitivity DNA Kit (Agilent Technologies). Quantification was performed with the Quant-iT DNA Assay Kit, High Sensitivity (Thermo Fisher Scientific). Equimolar pools of each library (10 pM) were subjected to high-throughput sequencing on the Illumina HiSeq 2500 platform using paired-end reads (2 × 100 bp) with a 2% PhiX spike-in control.

Bioinformatics analysis

Initial quality assessment of murine transcriptomic data was performed using FastQC (v0.12.1), with aggregated reports generated via MultiQC (v1.27.1) [Ewels et al., 2016]. Raw reads were processed using fastp (v0.23.4) to remove low-quality sequences [Chen et al., 2018]. Read alignment was subsequently conducted with STAR (v2.7.11) [Dobin et al., 2013] against the Mus musculus reference genome (GENCODE GRCm39.vM36) using the annotation file gencode.vM36.chr_patch_hapl_scaff.annotation.gtf, followed by gene-level quantification with HTSeq-count (v2.0.5) [Anders et al., 2015]. Differential gene expression analysis between experimental groups was performed using DESeq2 (v1.44.0) [Love et al., 2014]. For comprehensive functional interpretation, we implemented gene-set enrichment analysis (GSEA) through fgsea (v1.31.6) [Korotkevich et al., 2016] and clusterProfiler (v4.12.6) [Wu et al., 2021], utilizing both the MSigDB [Castanza et al., 2023] and CellMarker 2.0 [Hu et al., 2023] databases. Immune cell enrichment patterns were further assessed via Gene Set Variation Analysis (GSVA) [Hänzelmann et al., 2013]. Immune cell populations were deconvoluted from tumor transcriptome profiles using the mMCP-counter algorithm [Petitprez et al., 2020]. Co-expression network analysis was performed using BioNERO (v1.12.0) [Almeida-Silva et al., 2022], employing signed hybrid network topology with Pearson correlation metrics. Potential batch effects were addressed using the removeBatchEffect function from the limma package [Ritchie et al., 2015]. Functional annotation of co-expressed gene modules included over-representation analysis against multiple pathway databases including MSigDB, KEGG [Kanehisa et al., 2025], Reactome [Milacic et al., 2024], and WikiPathways [Agrawal et al., 2024]. Protein-protein interaction networks were reconstructed using the STRING database v12.0 through the STRINGdb (v2.16.4) Bioconductor package [Szklarczyk et al., 2023]. Visualization of analytical results was achieved using a combination of pheatmap (v1.0.12) for heatmap generation, ggplot2 (v3.5.1) for general plotting, and EnhancedVolcano (v1.22.0) for specialized visualization of differential expression results. The entire analytical workflow was implemented in R (v4.4.2).

flowchart LR
    A[("Raw reads")] --> B{"fastp"}
    B --> C{"FastQC"} & E{"STAR"}
    C --> D{"MultiQC"}
    E --> F{"HTSeq"}
    F --> G[("Count table")]
    G --> I{"DESeq2"} & Z{"limma"} & W{"PCA"}
    Z --> H{"PCA"} & J{"BioNERO"}
    I --> K{"GSEA"}
    J --> L{"ORA"} & N{"STRING"}
    G --> V{"GSVA"}

References

  1. Ewels, Philip, et al. “MultiQC: summarize analysis results for multiple tools and samples in a single report.” Bioinformatics 32.19 (2016): 3047-3048.
  2. Chen, Shifu, et al. “fastp: an ultra-fast all-in-one FASTQ preprocessor.” Bioinformatics 34.17 (2018): i884-i890.
  3. Dobin, Alexander, et al. “STAR: ultrafast universal RNA-seq aligner.” Bioinformatics 29.1 (2013): 15-21.
  4. Anders, Simon, Paul Theodor Pyl, and Wolfgang Huber. “HTSeq—a Python framework to work with high-throughput sequencing data.” Bioinformatics 31.2 (2015): 166-169
  5. Love, Michael I., Wolfgang Huber, and Simon Anders. “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome biology 15 (2014): 1-21.
  6. Hu, Congxue, et al. “CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data.” Nucleic acids research 51.D1 (2023): D870-D876.
  7. Hänzelmann, Sonja, Robert Castelo, and Justin Guinney. “GSVA: gene set variation analysis for microarray and RNA-seq data.” BMC bioinformatics 14 (2013): 1-15.
  8. Petitprez, Florent, et al. “The murine Microenvironment Cell Population counter method to estimate abundance of tissue-infiltrating immune and stromal cell populations in murine samples using gene expression.” Genome medicine 12 (2020): 1-15.
  9. Almeida-Silva, Fabricio, and Thiago M. Venancio. “BioNERO: an all-in-one R/Bioconductor package for comprehensive and easy biological network reconstruction.” Functional & Integrative Genomics 22.1 (2022): 131-136.
  10. Ritchie, Matthew E., et al. “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic acids research 43.7 (2015): e47-e47.
  11. Kanehisa, Minoru, et al. “KEGG: biological systems database as a model of the real world.” Nucleic Acids Research 53.D1 (2025): D672-D677.
  12. Milacic, Marija, et al. “The reactome pathway knowledgebase 2024.” Nucleic acids research 52.D1 (2024): D672-D678.
  13. Agrawal, Ayushi, et al. “WikiPathways 2024: next generation pathway database.” Nucleic acids research 52.D1 (2024): D679-D689.
  14. Szklarczyk, Damian, et al. “The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest.” Nucleic acids research 51.D1 (2023): D638-D646.

Results

Experimental Design and Sample Overview

The analysis included a total of 14 transcriptomic samples representing three experimental groups: 1) melanoma (n=6); 2) melanoma supplemented with Bifidobacterium adolescentis 150 (n=6); and 3) melanoma supplemented with Lacticaseibacillus rhamnosus K32 (n=2) (Table 1). The study was conducted in two sequential replicates: the first replicate comprised only the melanoma and B. adolescentis-supplemented groups (n=8), while the second replicate incorporated all three experimental conditions (n=6). The genomes of the bacteria used in this experiment are available at the following links GCF_001010915.1 and GCF_000735255.1. For clarity in data interpretation, we introduced the following standardized codings:

Experimental groups:

  1. Melanoma –> M

  2. Melanoma with B. adolescentis supplement –> M_BIF

  3. Melanoma with L. rhamnosus supplement –> M_LAC

Experimental batch:

  1. First experiment –> batch_1

  2. Second experiment –> batch_2

Quality control

Following quality filtering with fastp, aggregated quality metrics were compiled using MultiQC to summarize the FastQC reports. As illustrated in Figure 1, the MultiQC quality profile indicates that the sequencing data maintained adequate overall quality. Post-filtering, approximately 198 million reads (mean ± SD: 14 ± 5 million reads per sample) were retained for downstream analysis, with an average of 16 ± 7% of reads per sample removed during quality control (see Table 2 for detailed filtering statistics). Transcript abundance was quantified using the HTSeq tool. Figure 3 displays the distribution of HTSeq counts across all experimental samples. The resulting gene expression count matrix was normalized using the median of ratios method, with normalized values presented in Table 4.

MiltiQC report

MultiQC full report Figure 1. Summary MultiQC report.

Reads filtering by quality statistic

Mapping reads to reference counts

Saving 5 x 7 in image

Figure 3. Barplot denoted HTSeq counts per experimantal sample.

Normalized counts table

Principal component analysis

Principal component analysis (PCA) was employed as a linear dimensional reduction technique for exploratory data analysis and visualization. Prior to PCA, the expression data underwent variance-stabilizing transformation (VST), which derives stable variance estimates from fitted dispersion-mean relationships. This transformation processes count data normalized by size factors, generating a matrix of approximately homeostatic values characterized by constant variance across the range of mean expression levels, while simultaneously accounting for library size differences. The transformed data were visualized through bi-dimensional projection of the first two principal components (Figure 4), with the proportion of explained variance for each component detailed in Figure 5. Permutation multivariate analysis of variance (PERMANOVA) revealed statistically significant associations between expression profiles and batch effects, while no significant relationships with experimental groups were detected.

Saving 5 x 4 in image

Figure 4. PCA visualization using regularized log transformation counts.
Saving 5 x 4 in image

Figure 5. The proportion of explained variance of each component.
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 9999

adonis2(formula = t(vsd) ~ batch + group, data = meta_data, permutations = 9999, method = "euclidean", by = "margin")
         Df SumOfSqs      R2      F Pr(>F)   
batch     1     9421 0.19069 3.7386 0.0086 **
group     2     7784 0.15756 1.5445 0.1325   
Residual 10    25199 0.51007                 
Total    13    49404 1.00000                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Differential expression analysis

Differential expression analysis was conducted using the DESeq2 package, employing the model formula ~ batch + group to account for batch-related dispersion effects. The analysis revealed positive log2 fold change (lfc) values associated with both B. adolescentis 150 and L. rhamnosus K32 supplementation groups, while negative values corresponded to the untreated melanoma control group. DESeq2 performs differential expression analysis by fitting generalized linear models to negative binomial distributed counts and evaluating linear contrasts between experimental groups. Our results demonstrated 3 significantly up-regulated and 16 down-regulated genes in the M_BIF group compared to the M control group. For the M_LAC comparison, 21 differentially expressed genes were identified relative to the M group, while no significant changes were observed in the reverse comparison. When directly comparing bacterial supplementation groups, 39 genes were associated with M_BIF and 116 with M_LAC. Differentially expressed genes were determined using stringent thresholds (adjusted p-value < 0.05, |lfc| > 1). Complete statistical results of the differential analysis are presented in Tables 4-6. Visual representation of the results through volcano plots (Figures 6-8) effectively illustrates the relationship between statistical significance (-log10 p-value) and magnitude of expression changes (lfc) across all comparisons.

M_BIF vs M

Figure 6. Volcano plot denoted differential expressed genes: M_BIF vs M.

M_LAC vs M

Figure 7. Volcano plot denoted differential expressed genes: M_LAC vs M.

M_BIF vs M_LAC

Figure 8. Volcano plot denoted differential expressed genes: M_BIF vs M_LAC.

Gene set enrichment analysis

Gene Set Enrichment Analysis (GSEA) was employed to evaluate whether predefined gene sets exhibited statistically significant and concordant differences between biological states (e.g., phenotypic groups). This computational approach was implemented using multiple databases, enabling comprehensive identification of biological functions associated with the experimental conditions. For this analysis, we specifically utilized the HALLMARK MSigDB gene sets. The complete GSEA results are presented in Figure 9 and Tables 7-9. To delineate cell-type-specific expression patterns, we performed marker analysis using the Mouse Cell Marker 2.0 database, with results visualized in Figure 10 and detailed in Tables 10-12. Complementary to this, Gene Set Variation Analysis (GSVA) was conducted to examine HALLMARK MSigDB gene sets in a sample-oriented manner (Figures 11-16). The investigation was extended to immune cell profiling through two complementary approaches: mMCP-counter was implemented for broad immune cell type quantification (Figure 17), while GSVA was specifically applied to assess critical immune parameters including M1/M2 macrophage polarization ratios and CD8+ T cell abundance (Figures 18-21).

MSigDb Hallmark

M_BIF vs M
M_LAC vs M
M_BIF vs M_LAC

Figure 9

Figure 9. Heatmap denoted NES values of MSigDB Hallmark pathways linked to experimental groups.

Cell Marker 2.0

M_BIF vs M
M_LAC vs M
M_BIF vs M_LAC

Figure 10

Figure 10. Heatmap denoted NES values of immune cell types linked to experimental groups.

Co-expression analysis

Gene co-expression analysis was performed to identify functionally related gene clusters and investigate their potential associations with phenotypic traits. Prior to analysis, technical batch effects were removed from the expression matrix using the removeBatchEffect function implemented in the limma package. PCA of the batch-corrected data showed similar sample clustering patterns to the uncorrected dataset (Figures 11 and 12), with PERMANOVA confirming the effective removal of batch effects while revealing no significant associations with experimental groups. The BioNERO package identified several co-expression modules showing differential associations with experimental conditions. Notably, the greenyellow and sienna3 modules demonstrated positive correlations with the untreated melanoma group, while the pink and royalblue modules showed negative associations (Figures 13-16, Tables 13 and 14). Functional characterization through over-representation analysis revealed significant biological pathways enriched in these modules (Tables 15 and 16). Hub gene analysis identified key regulatory elements within each module (Tables 17 and 18), with correlation networks visualized for selected modules (Figure 17). Additional network characterization using STRING database highlighted potential functional relationships among module genes (Figures 18 and 22) with clustering (Figures 19 and 23) and enrichment (Figures 20, 21, 24, and 25) analysis.

Batch correction

Saving 5 x 4 in image

Figure 11. PCA visualization using vst transformation batch corrected counts.
Saving 5 x 4 in image

Figure 12. The proportion of explained variance of each component.
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 9999

adonis2(formula = t(mat) ~ batch + group, data = meta_data, permutations = 9999, by = "margin")
         Df  SumOfSqs      R2      F Pr(>F)
batch     1 0.0001345 0.03397 0.4480 0.9468
group     2 0.0009222 0.23285 1.5355 0.1214
Residual 10 0.0030029 0.75823              
Total    13 0.0039604 1.00000              

Net construction

Saving 8 x 5 in image

Figure 13. Dengrogramm of genes and modules.
Saving 8 x 6 in image

Figure 14. Number of genes per module.

Figure 26

Co-expressed modules linked to M group

Figure 15. Co-expression module association with experimental groups.
Saving 6 x 12 in image

Figure 16. Expression profiles of modules linked to experimental groups.

Over-representation analysis

Hub genes analysis and visualization

Figure 17. Corelation network of co-expressed genes modules positively and negatively linked to M group.

STRING analysis of modules

Figure 18

Figure 18 - STRING website

Figure 18. STRING database network plot obtained using DEG positively linked to M group.

Figure 19

Figure 19. Cluster analysis by k-means algorithm using genes linked to M-positive co-expression modules.

Figure 20

Figure 20. STRING enrichment analysis obtained using genes linked to M-positive co-expression modules and KEGG database.

Figure 21

Figure 21. STRING enrichment analysis obtained using genes linked to M-positive co-expression modules and Reactome database.

Figure 22

Figure 22 - STRING website

Figure 22. STRING database network plot obtained using DEG negatively linked to M group.

Figure 23

Figure 23. Cluster analysis by MCL algorithm using genes linked to M-negative co-expression modules.

Figure 24

Figure 24. STRING enrichment analysis obtained using genes linked to M-negative co-expression modules and Gene Ontology: Cellalur Component database.

Figure 25

Figure 25. STRING enrichment analysis obtained using genes linked to M-negative co-expression modules and Tissues expression database.